I will look into different integration methods to assess batch effects between the Liver 20Jan21 sample and the two Liver 2Jul21 samples.
First, this is how the UMAP looks without any kind of integration.
Dll4/Myc KO and the triple mutant sit a bit further away, which may be due to the lower sequencing depth of these two conditions.
You can clearly see a pattern of higher-expressing cells going to the right, while these two conditions cluster to the left. So there may be some technical biases to address.
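A quick way to check the sequencing depth suspicion is to compare the median UMI count per cell across conditions. The sketch below uses a toy metadata table with made-up values and placeholder condition labels; with a real Seurat object the same columns would come from its `meta.data` slot, where `nCount_RNA` is the default per-cell UMI total. The column name `Condition` is an assumption.

```r
# Toy per-cell metadata. In a Seurat object this table would be obj@meta.data,
# where nCount_RNA holds the total UMI count of each cell.
meta <- data.frame(
  Condition  = c("A", "A", "B", "B", "C", "C"),            # placeholder labels
  nCount_RNA = c(9000, 11000, 3000, 5000, 8000, 12000)     # made-up values
)

# Median UMI count per condition, used here as a rough proxy for sequencing depth
depth_by_condition <- aggregate(nCount_RNA ~ Condition, data = meta, FUN = median)
print(depth_by_condition)
```

A condition whose median is clearly lower than the rest (condition "B" in this toy table) would be a candidate for a depth-driven shift on the UMAP.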
A first control analysis will be to add the Control(4d) cells to this whole dataset and see how they cluster, i.e. whether or not they fall on top of the Ctrl 2 week cells.
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 7670
## Number of edges: 287676
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8948
## Number of communities: 7
## Elapsed time: 1 seconds
Now, if we look at how the different conditions are distributed, we can see that Control(4d) does not overlap with Control 2 weeks. We have a batch effect here, mainly because of the sequencing depth issues.
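Besides looking at the UMAP, the overlap between conditions can be put into numbers by tabulating the condition of the cells in each cluster. This is only a sketch on a toy table: the labels are placeholders, and the column names `Condition` and `seurat_clusters` (the column Seurat fills after clustering) are assumed to match the real metadata.

```r
# Toy metadata: one row per cell, with its condition and its cluster
meta <- data.frame(
  Condition       = c("Ctrl_a", "Ctrl_a", "Ctrl_b", "Ctrl_b", "Ctrl_b", "Ctrl_a"),
  seurat_clusters = factor(c(0, 0, 1, 1, 1, 0))
)

# Fraction of each cluster (rows) contributed by each condition (columns)
composition <- prop.table(table(meta$seurat_clusters, meta$Condition), margin = 1)
print(round(composition, 2))
```

If two groups that should be biologically equivalent never share a cluster, as in this toy table, that points to a batch effect rather than biology.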
First, I will try the integration considering all 7 conditions, using the standard Seurat method.
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 7670
## Number of edges: 233295
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8820
## Number of communities: 8
## Elapsed time: 0 seconds
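For reference, an anchor-based integration along the lines of the standard Seurat 4 workflow could look roughly like the sketch below. This is not the exact code behind the output above: the helper name `integrate_by`, the number of dimensions and the clustering resolution are all assumptions, and the helper expects a Seurat object that has the grouping variable as a metadata column.

```r
library(Seurat)

# Hypothetical helper: integrate a Seurat object across the groups in `split_col`.
# n_dims and resolution are placeholder defaults, not the values used in this report.
integrate_by <- function(obj, split_col, n_dims = 30, resolution = 0.5) {
  # One object per group, each normalised and given its own variable features
  obj_list <- SplitObject(obj, split.by = split_col)
  obj_list <- lapply(obj_list, function(x) {
    x <- NormalizeData(x, verbose = FALSE)
    FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
  })

  # Anchors link similar cells across groups; IntegrateData uses them to
  # build a corrected "integrated" assay
  features <- SelectIntegrationFeatures(object.list = obj_list)
  anchors <- FindIntegrationAnchors(object.list = obj_list,
                                    anchor.features = features,
                                    dims = 1:n_dims)
  integrated <- IntegrateData(anchorset = anchors, dims = 1:n_dims)

  # Downstream analysis runs on the integrated assay
  DefaultAssay(integrated) <- "integrated"
  integrated <- ScaleData(integrated, verbose = FALSE)
  integrated <- RunPCA(integrated, npcs = n_dims, verbose = FALSE)
  integrated <- RunUMAP(integrated, reduction = "pca", dims = 1:n_dims)
  integrated <- FindNeighbors(integrated, reduction = "pca", dims = 1:n_dims)
  FindClusters(integrated, resolution = resolution)
}
```

With a hypothetical object `liver` carrying a `Condition` column, the call would be `integrate_by(liver, "Condition")`. Passing a different metadata column changes what is treated as a batch without touching the rest of the pipeline.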
Integration by Condition is way too harsh. Doing it by Condition is not the way to go; I will have to do it by sequencing depth.
That means I will split this big dataset into two: the old sequencing run with high depth, and the new sequencing run with low depth.
I created a new category “SeqDepth”, split the dataset into “High” and “Low”, and then ran the integration on those two categories.
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 7670
## Number of edges: 294710
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9004
## Number of communities: 7
## Elapsed time: 0 seconds
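The “SeqDepth” category can be derived from whichever metadata column records the sequencing run. The sketch below shows the idea on a toy table; the column name `Batch` and its values are hypothetical stand-ins for the Liver 20Jan21 (old, high depth) and Liver 2Jul21 (new, low depth) runs.

```r
# Toy metadata; the "Batch" column and its values are assumptions
meta <- data.frame(
  Batch     = c("Liver_20Jan21", "Liver_2Jul21_a", "Liver_2Jul21_b", "Liver_20Jan21"),
  row.names = paste0("cell", 1:4)
)

# Runs considered high depth; everything else is labelled "Low"
high_depth_batches <- c("Liver_20Jan21")
meta$SeqDepth <- ifelse(meta$Batch %in% high_depth_batches, "High", "Low")
print(table(meta$SeqDepth))

# With a Seurat object `obj` the same label could be attached with
#   obj$SeqDepth <- ifelse(obj$Batch %in% high_depth_batches, "High", "Low")
# and then used as the grouping variable: SplitObject(obj, split.by = "SeqDepth")
```

Labelling by run rather than thresholding the UMI count of each cell keeps all cells of one sample in the same group.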
It works much, much better this way; however, we again lose the AV zonation.
Now that we see this is the strategy to follow, I will do the integration by sequencing depth, but leaving out the Control(4d) group.
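Dropping the Control(4d) group before integrating is a plain subsetting step. The sketch below builds a tiny Seurat object from random counts just to have something to run on; apart from “Control(4d)”, the condition labels and the `Condition` column name are placeholders.

```r
library(Seurat)

# Tiny toy object: 10 genes x 6 cells of random counts (illustration only)
set.seed(1)
counts <- matrix(rpois(60, lambda = 5), nrow = 10,
                 dimnames = list(paste0("Gene", 1:10), paste0("Cell", 1:6)))
obj <- CreateSeuratObject(counts = counts)
obj$Condition <- c("Control", "Control(4d)", "Mutant",
                   "Control(4d)", "Mutant", "Control")

# Keep every cell whose condition is not Control(4d)
keep <- colnames(obj)[obj$Condition != "Control(4d)"]
obj_no4d <- subset(obj, cells = keep)

print(table(obj_no4d$Condition))
```

Selecting by cell names avoids any quoting issues with the parentheses in the condition label.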
Integration must be done by sequencing depth, not by Condition, as it better addresses the batch effects.
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C LC_TIME=Spanish_Spain.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] patchwork_1.1.1 yaml_2.2.1 rmarkdown_2.11 dplyr_1.0.7 ggplot2_3.3.5 SeuratObject_4.0.4 Seurat_4.0.5 knitr_1.36 BiocStyle_2.18.1
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.15 colorspace_2.0-2 deldir_1.0-6 ellipsis_0.3.2 ggridges_0.5.3 spatstat.data_2.1-0 farver_2.1.0 leiden_0.3.9 listenv_0.8.0 ggrepel_0.9.1 RSpectra_0.16-0 fansi_0.5.0 codetools_0.2-18 splines_4.0.3 polyclip_1.10-0 jsonlite_1.7.2 ica_1.0-2 cluster_2.1.2 png_0.1-7 uwot_0.1.11 shiny_1.7.1 sctransform_0.3.2 spatstat.sparse_2.0-0 BiocManager_1.30.16 compiler_4.0.3 httr_1.4.2 Matrix_1.3-4 fastmap_1.1.0 lazyeval_0.2.2 later_1.3.0 htmltools_0.5.2 tools_4.0.3 igraph_1.2.9 gtable_0.3.0 glue_1.5.1 RANN_2.6.1 reshape2_1.4.4 Rcpp_1.0.7 scattermore_0.7 jquerylib_0.1.4 vctrs_0.3.8 nlme_3.1-153 lmtest_0.9-39 xfun_0.26 stringr_1.4.0 globals_0.14.0 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.1 irlba_2.3.3 goftest_1.2-3 future_1.23.0 MASS_7.3-54 zoo_1.8-9 scales_1.1.1 spatstat.core_2.3-2 promises_1.2.0.1 spatstat.utils_2.2-0 parallel_4.0.3 RColorBrewer_1.1-2 reticulate_1.22 pbapply_1.5-0 gridExtra_2.3 sass_0.4.0 rpart_4.1-15 stringi_1.7.6 highr_0.9 rlang_0.4.12 pkgconfig_2.0.3 matrixStats_0.61.0 evaluate_0.14 lattice_0.20-45 ROCR_1.0-11 purrr_0.3.4 tensor_1.5 labeling_0.4.2 htmlwidgets_1.5.4 cowplot_1.1.1 tidyselect_1.1.1 parallelly_1.29.0 RcppAnnoy_0.0.19 plyr_1.8.6 magrittr_2.0.1 bookdown_0.24 R6_2.5.1 generics_0.1.1 withr_2.4.3 pillar_1.6.4 mgcv_1.8-38 fitdistrplus_1.1-6 survival_3.2-13 abind_1.4-5 tibble_3.1.6 future.apply_1.8.1 crayon_1.4.2 KernSmooth_2.23-20 utf8_1.2.2 spatstat.geom_2.3-0 plotly_4.10.0 grid_4.0.3 data.table_1.14.2 digest_0.6.29 xtable_1.8-4 tidyr_1.1.4 httpuv_1.6.3 munsell_0.5.0 viridisLite_0.4.0 bslib_0.3.1